ABSTRACT


This paper studies the estimation of coefficients β in single index models
such that E(y|X) = F(α + X'β), where the function F is misspecified or unknown. A
linear instrumental variables slope coefficient vector of y regressed on X is
shown to be consistent for β up to a scalar multiple, where the instruments
are appropriately defined score vectors of the marginal distribution of X. The
framework is illustrated by several common limited dependent variable models
and models involving a transformed dependent variable. Similar estimators are
indicated for multiple index models and models where extraneous variables are
present. The construction of the instrumental variables is discussed, and
illustrated by several examples. The asymptotic distribution of the
instrumental variables estimator is established.


CONSISTENT  ESTIMATION  OF  SCALED  COEFFICIENTS 


1.  Introduction 


In this paper we consider the generic econometric modeling situation in
which a dependent variable y is modeled as a function of a vector of
independent variables X and stochastic terms, where the conditional
expectation of y given X takes the form E(y|X) = F(α + X'β). This situation
exists for many standard models of discrete choice, censoring and selection,
but is clearly not limited to such models. Our interest is in what can be
learned about the values of the coefficients β without specific assumptions on
the distribution of unobserved stochastic terms or other functional form
aspects; in other words, when the true form of the function F is misspecified
or unknown.

For different examples of limited dependent variable models,
Ruud (1983a), Goldberger (1981), Deaton and Irish (1984) and Chung and
Goldberger (1984), among others, have studied the conditions under which OLS
regression coefficients and other quasi-maximum likelihood estimators will
consistently estimate β up to a scalar multiple. Ruud (1983a) points out that a
sufficient condition for this property occurs when the conditional expectation
of each component of X given Z = α + X'β is linear in Z, which is valid, for
example, when X is multivariate normally distributed. Goldberger (1981), Deaton
and Irish (1984) and Chung and Goldberger (1984) point out the sufficiency of an
analogous condition with a more general definition of Z.

An intriguing feature of this work is that it provides special cases
where knowledge of the marginal distribution of X is very useful for
estimating behavioral effects when certain features of the true model are
unknown. The question is immediately raised as to whether more general results
of this type can be obtained, because as Ruud (1983a) states, the above
sufficient condition is "too restrictive to be generally applicable." Results
which apply to more general marginal distribution forms are of substantial
practical interest because the marginal distribution of X can in general be
empirically characterized.

The purpose of this paper is to indicate that if X is a continuously
distributed random vector, knowledge of the marginal distribution of X in
general permits consistent estimation of β up to a scalar multiple. In
particular, we show that such a consistent estimate can be obtained as the slope
coefficient vector of a linear instrumental variables regression of y on X,
where the instruments are appropriately defined score vectors from the
marginal distribution of X. The ratio of any two components of this slope
vector will consistently estimate the ratio of the corresponding components of
β. These estimates may suffice for many applications, such as judging
relative marginal utilities in a discrete choice situation. Moreover, because
the asymptotic distribution of the instrumental variables estimator is easily
established, certain scale free hypotheses can be tested, such as zero
restrictions and equality restrictions on the components of β.

More broadly, the ratio estimates provide a consistent benchmark for
choosing specific modelling assumptions. Namely, if alternative functional
form or stochastic distribution assumptions give rise to substantively
different estimates of β, the consistent ratio estimates can guide the choice
of the best specification. For example, in a binary discrete choice situation,
separate estimates of β under logit or probit assumptions could be judged in
relation to the consistent instrumental variable ratio estimates.

The exposition begins with notation, examples and assumptions in Section
2. The main result on consistency of the instrumental variables slope vector
is presented in Section 3.1, with immediate extensions to more general models
in Section 3.2. The proofs are of some potential independent interest because
they utilize the results of Stoker (1982, 1983) in a novel way. Section 3.3
presents facilitating results on the construction of the instrumental
variables, and Section 3.4 establishes the asymptotic distribution of the
instrumental variables estimator. Specific examples of independent variable
distributions are considered in Section 4, where the relation of the results
to the previous literature is discussed. Section 5 contains some concluding
remarks.

2.  Notation  and  Basic  Assumptions 

We consider the situation where data are observed on a dependent variable
y_k and an M-vector of independent variables X_k, for k = 1,...,K, where M ≥ 2.
(y_k, X_k), k = 1,...,K, represent random drawings from a distribution with density
P_0(y,X) = q(y|X)p_0(X), which is absolutely continuous with respect to a
σ-finite measure ν. p_0(X) represents the density of the marginal distribution of
X. The conditional density q(y|X) represents the true behavioral econometric
model, for which we assume that the conditional expectation E(y|X) can be
written in the form

(2.1)    $E(y \mid X) = F(\alpha + X'\beta) = F(Z)$


for some function F, where α is a constant, β = (β_1,...,β_M)' is an M-vector of
constants, and Z is defined as Z = α + X'β. We refer to Z as an index variable,
with (2.1) a single index model. This framework is very general, subsuming
many standard limited dependent variable models, but is not restricted to
such models. Before proceeding to specific examples, it is useful to note the
following generic special case of (2.1). Suppose that Z* is a general
index variable such that Z* − Z is independent of X; then if E(y|Z*) =
F*(Z*) for some function F*, (2.1) is implied. This implies the natural result
that important behavioral variables can be omitted from X in (2.1) without
affecting our results, provided that the omitted variables are independent of
the included ones. We will write Z* = α + X'β + ε for such an index, where ε is
distributed independently of X. We now turn to some specific examples:

Example  1:  Binary  Discrete  Choice 

Suppose that y represents a dichotomous random variable modeled as

$y = \begin{cases} 1 & \text{if } \varepsilon > -(\alpha + X'\beta) \\ 0 & \text{otherwise} \end{cases}$

Here E(y|X) = F(α + X'β) is the probability of y = 1 given the value of X, with the
true function F determined by the true distribution of ε. If ε is distributed
normally with mean 0 and variance σ², then the familiar probit model results,
with F(α + X'β) = Φ[(α + X'β)/σ], where Φ is the cumulative normal distribution
function. Logit models, etc. can easily be included.

Example  2:  Tobit  Models 

Suppose that y is equal to an index Z* only if Z* is positive, as in the
following censored tobit specification

$y = \begin{cases} \alpha + X'\beta + \varepsilon & \text{if } \varepsilon > -(\alpha + X'\beta) \\ 0 & \text{otherwise} \end{cases}$

Alternatively, if y = α + X'β + ε is observed only when ε > −(α + X'β), we have the
truncated tobit specification.
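
As a worked illustration of how the censored specification fits (2.1), suppose (an assumption made only for this illustration, not required by the paper) that ε is normal with mean 0 and variance σ². Writing Z = α + X'β, with Φ the standard normal distribution function and φ_N the standard normal density, the conditional mean of the censored y would be

```latex
E(y \mid X) = Z\,\Phi(Z/\sigma) + \sigma\,\phi_N(Z/\sigma) \equiv F(Z),
\qquad
\frac{dF}{dZ} = \Phi(Z/\sigma).
```

Here X enters only through the index Z. A different distribution for ε changes the form of F but not the single index structure.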


Example 3: Dependent Variable Transformations

Suppose there exists a function g(y) such that the true model is of the form

$g(y) = \alpha + X'\beta + \varepsilon$

where g(y) is invertible everywhere except for a set of ν-measure 0. A
specific example here is the familiar Box-Cox transformation where

$y^{(\lambda)} = \alpha + X'\beta + \varepsilon$

with y^(λ) = (y^λ − 1)/λ for λ ≠ 0, y^(λ) = ln(y) for λ = 0.


These  examples  serve  to  illustrate  the  wide  spectrum  of  models  covered  by 
(2.1)  with  general  function  F,  and  many  other  single  index  examples  can  be 
found.  Multiple  index  models  are  considered  in  Section  3.2.  We  now  turn  to  the 
other  assumptions  required. 

Formally, we assume that X is continuously distributed, having carrier
set Ω of the following form:

Assumption 1: Ω is a measurable, closed, convex subset of R^M with nonempty
interior. For X ∈ ∂Ω, where ∂Ω is the boundary of Ω, we have
F(α + X'β)p_0(X) = 0 and Xp_0(X) = 0.

Assumption 1 allows for unbounded X's, where Ω = R^M and ∂Ω = ∅. For the bounded
case Fp_0 and Xp_0 vanish on the boundary, which is obviously implied if p_0
vanishes on the boundary. While the majority of the results employ Assumption
1, the incorporation of discrete (qualitative) independent variables is
discussed in Section 3.2.

The  main  regularity  condition  on  the  behavioral  model  is 

Assumption 2: F(Z) is differentiable for all Z = α + X'β, where X ∈ Ω̃, and Ω̃ differs
from Ω by a set of ν-measure 0.

For technical reasons we will utilize the translation family generated by
p_0(X), defined as Π = {p(X|θ)}, where p(X|θ) = p_0(X − θ) is defined on
Ω(θ) = {X + θ | X ∈ Ω}, with Θ a compact subset of R^M with interior point θ = 0. We set
P(y,X|θ) = q(y|X)p(X|θ), so that P(y,X|0) = P_0(y,X), p(X|0) = p_0(X) and Ω(0) = Ω. We
assume


Assumption 3: P(y,X|θ) is twice differentiable in θ for all θ ∈ Θ. The means,
variances and covariances of y, X and the score vector ℓ_θ exist
for all θ ∈ Θ, where

(2.2)    $\ell_\theta = \frac{\partial \ln P(y, X \mid \theta)}{\partial \theta}$

The matrix E(ℓ_θ X') is nonsingular for all θ ∈ Θ.


Assumption 3 clearly guarantees the existence of the means, variances and
covariances of y, X and ℓ_0 for θ = 0, the data set moments. Note that ℓ_0 can be
written as:

(2.3)    $\ell_0 = \frac{\partial \ln p(X \mid 0)}{\partial \theta} = -\frac{\partial \ln p_0(X)}{\partial X}$

If we denote the mean of y for each θ ∈ Θ as

(2.4)    $E(y) \equiv \phi^*(\theta) = \int y\, P(y, X \mid \theta)\, d\nu$

then we assume

Assumption 4: φ*(θ) is differentiable for all θ ∈ Θ, with nonzero derivative at
θ = 0.


Finally, we give Assumption 5 in the Appendix, which is a purely technical
regularity assumption that assures that derivatives may be taken under
expectations. While somewhat formidable technically, these assumptions are
collectively very weak.

3.  Consistent  Estimation  of  Scaled  Coefficients 


In this section we consider the slope estimates of the linear equation

(3.1)    $y_k = c + X_k'd + u_k$

obtained by instrumenting with (1, ℓ_0k')', where ℓ_0k is the score vector (2.3)
evaluated at X_k. The slope coefficients d̂ = (d̂_1,...,d̂_M)' can be written
explicitly as

(3.2)    $\hat{d} = (S_{0x})^{-1} S_{0y}$

where S_0x = Σ_k ℓ_0k(X_k − X̄)'/K and S_0y = Σ_k ℓ_0k(y_k − ȳ)/K are the relevant sample
covariance matrices. In Section 3.1 we establish the main result that d̂ is a
strongly consistent estimator of β up to a scalar multiple. In Section 3.2 we
extend the result to more general models with extraneous independent variables
and several index variables. In Section 3.3 we discuss how to construct the
instruments ℓ_0k in applications, and in Section 3.4 we establish the
asymptotic distribution of d̂ for statistical inference.
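
The estimator (3.2) can be illustrated with a small simulation. The sketch below is not from the paper: it assumes M = 2, two independent standard logistic regressors (so the joint score is the vector of marginal scores tanh(x_j/2)), a binary y as in Example 1, and a deliberately non-standard error ε = Exp(1) − 1 so that F is neither logit nor probit. All names and parameter values are illustrative.

```python
import math
import random

def simulate_iv_slopes(K=100_000, alpha=0.2, beta=(1.0, 0.5), seed=0):
    """Return the slope estimates (d1, d2) of (3.2) on simulated data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(K):
        x = []
        for _ in range(2):
            # Standard logistic draw by inverting the distribution function.
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            x.append(math.log(u / (1.0 - u)))
        eps = rng.expovariate(1.0) - 1.0          # skewed, mean-zero error
        z = alpha + beta[0] * x[0] + beta[1] * x[1]
        X.append(x)
        y.append(1.0 if eps > -z else 0.0)        # binary choice rule
    ybar = sum(y) / K
    xbar = [sum(x[j] for x in X) / K for j in range(2)]
    S0x = [[0.0, 0.0], [0.0, 0.0]]
    S0y = [0.0, 0.0]
    for x, yk in zip(X, y):
        score = [math.tanh(x[0] / 2.0), math.tanh(x[1] / 2.0)]
        for i in range(2):
            S0y[i] += score[i] * (yk - ybar) / K
            for j in range(2):
                S0x[i][j] += score[i] * (x[j] - xbar[j]) / K
    # Solve the 2x2 system S0x d = S0y.
    det = S0x[0][0] * S0x[1][1] - S0x[0][1] * S0x[1][0]
    d1 = (S0x[1][1] * S0y[0] - S0x[0][1] * S0y[1]) / det
    d2 = (S0x[0][0] * S0y[1] - S0x[1][0] * S0y[0]) / det
    return d1, d2

d1, d2 = simulate_iv_slopes()
print("d2/d1 =", d2 / d1, "(beta_2/beta_1 = 0.5)")
```

Only the ratio d2/d1 is compared with β_2/β_1, since the individual slopes estimate β only up to the unknown scale.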

3.1 The Main Result

We begin by showing

Theorem 1: Under Assumptions 1, 2, 3, 4 and 5, lim d̂ = Tβ a.s. as K → ∞, where T
is a nonzero constant.

Proof: We first consider the unbounded case where Ω = R^M and ∂Ω = ∅. Begin by
reparameterizing the translation family Π by E(X) ≡ μ = μ_0 + θ, where μ_0 = E_0(X)
is the population mean of X in the data, and define E(y) = φ(μ) = φ*(μ − μ_0).
Since Ω(θ) = Ω for all θ ∈ Θ in the unbounded case, by a direct application of
Theorem 2 of Stoker (1983) we have that

(3.3)    $\lim \hat{d} = \frac{\partial \phi(\mu_0)}{\partial \mu} = \frac{\partial \phi^*(0)}{\partial \theta}$  a.s.

where the latter equality follows from the definition of μ. The result follows
from computing the latter derivative. By a change of variables to x = X − θ, we
have that

(3.4)    $E(y) = \phi^*(\theta) = \int F(\alpha + (x + \theta)'\beta)\, p_0(x)\, d\nu$

Now, from Assumptions 2 and 5, we differentiate (3.4) as

(3.5)    $\frac{\partial \phi^*}{\partial \theta} = \int \frac{\partial F}{\partial \theta}\, p_0(x)\, d\nu = \int \frac{\partial F}{\partial Z}\, \beta\, p_0(x)\, d\nu = \left[ \int \frac{\partial F}{\partial Z}\, p_0(x)\, d\nu \right] \beta$

where ∂F/∂Z is evaluated at Z = α + (x + θ)'β. The result follows from evaluating (3.5)
at θ = 0 and inserting into (3.3), where T = ∫ (∂F/∂Z) p_0(X) dν, and Z = α + X'β. T is
nonzero by Assumption 4.

The bounded case where ∂Ω ≠ ∅ follows from a very careful consideration of
the applicability of Theorem 2 of Stoker (1983) to this problem. Theorem 2
applies only when the carrier set does not vary with θ, and so (3.3) is no
longer immediately valid, because Ω(θ) ≠ Ω when θ ≠ 0. The structure of the
derivative ∂φ*/∂θ in this case can be written as

(3.6)    $\frac{\partial \phi^*(0)}{\partial \theta} = \frac{\partial}{\partial \theta} \int_{\Omega} F(\alpha + X'\beta)\, p(X \mid \theta)\, d\nu + \frac{\partial}{\partial \theta} \int_{\Omega(\theta)} F(\alpha + X'\beta)\, p_0(X)\, d\nu$

where each term is evaluated at θ = 0. The first term is the derivative of φ*(θ)
holding the carrier set Ω(θ) constant at Ω(0) = Ω, while the second term is the
derivative of φ*(θ) holding the integrand constant at F(α + X'β)p_0(X) while
varying the carrier set. By repeated application of Fubini's Theorem and the
Fundamental Theorem of Calculus, the second term reduces to integrals of Fp_0
over boundary points X ∈ ∂Ω, so it vanishes by the boundary condition of
Assumption 1. Now, what the proof of Theorem 2 of Stoker (1983) actually shows
is that lim S_0y is equal to the first derivative. By an analogous argument, we
have that lim S_0x = ∂μ/∂θ (an M×M identity matrix), so that equation (3.3) is
shown to be valid in this case also. Consequently, the result that
lim d̂ = Tβ a.s. follows.

QED
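
For intuition, the limit in Theorem 1 can also be checked directly in the unbounded case. The following is only a sketch: it assumes, beyond what is stated above, that ν is Lebesgue measure in X, that the population moments exist, and that the boundary terms in the integration by parts vanish. With ℓ_0 = −∂ln p_0(X)/∂X and I_M the M×M identity matrix,

```latex
\begin{aligned}
E(\ell_0\, y) &= -\int F(\alpha + X'\beta)\,\frac{\partial p_0(X)}{\partial X}\, d\nu
               = \int \frac{\partial F}{\partial Z}\,\beta\, p_0(X)\, d\nu = T\beta, \\
E(\ell_0\, X') &= -\int \frac{\partial p_0(X)}{\partial X}\, X'\, d\nu
               = \int p_0(X)\, I_M\, d\nu = I_M .
\end{aligned}
```

Since E(ℓ_0) = 0 under the same conditions, these moments are the population counterparts of S_0y and S_0x, so [E(ℓ_0 X')]⁻¹E(ℓ_0 y) = Tβ. This heuristic does not replace the argument via Stoker (1983).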


The technique of the above proof is quite nonstandard, and possibly of
some independent interest. The results of Stoker (1983) (and the predecessor
Stoker (1982)) connect the large sample limits of linear instrumental variables
regression coefficients to the aggregate effects induced by distribution
changes, or changes in the sample configuration. The above proof exploits
these results by considering the implied aggregate impact of a specific (but
artificial) type of distribution change. Namely, ∂φ*(0)/∂θ gives the local effect
on E(y) of varying the density of X within the translation family Π. This
effect is seen to be consistently estimated by d̂. The desired property of d̂ is
then established by calculating the value of the aggregate effect via (3.5).
This technique of proof, namely to perturb the sample distribution and then
trace the aggregate implications, may be useful in other contexts.

The reason that the translation family Π works for this problem is that
changes in the implied marginal distributions of Z = α + X'β are determined
locally in a neighborhood of θ = 0 by changes in the parameter θ'β. In fact,
this feature provides a characterization of the scalar T. Namely, if we denote
the mean of the index Z as η = E(Z) = α + (μ_0 + θ)'β, then (3.5) is seen to be
a chain rule formula where T = ∂φ*(0)/∂η, so that T is interpreted as the change
in E(y) induced by a change in the mean of the index E(Z) under density
translation.


3.2 Immediate Extensions - Extraneous Variables and Multiple Index Models

The logic of the above proof can be immediately applied to more general
modeling circumstances than provided by (2.1), which we outline below. For
this section (only), we expand the notation slightly to consider two sets of
independent variables: an M_1 ≥ 2 vector X_1 and an M_2 vector X_2. Suppose that the
behavioral model for y implies that the conditional expectation of y given X_1
and X_2 is of the form

(3.7)    $E(y \mid X_1, X_2) = F(\alpha_1 + X_1'\beta_1,\, X_2)$

for some function F and constant coefficients α_1 and β_1. (3.7) just adds the
extraneous variables X_2 to the model (2.1). We assume that (X_1', X_2')' is
distributed with density p_0(X_1, X_2).

It is easy to see that if X_1 is continuously distributed and X_1 and X_2
have no common components, then knowledge of p_0(X_1, X_2) allows consistent
estimation of β_1 up to a scalar multiple. In particular, reinterpret
Assumptions 1 through 5 to apply to X_1 (defining the translation family with
respect to X_1 only), define the generalized score vector as

(3.8)    $\ell_1 = -\frac{\partial \ln p_0(X_1, X_2)}{\partial X_1}$

and consider the slope coefficient estimates d̂_1 of the linear equation

(3.9)    $y_k = c_1 + X_{1k}'d_1 + u_{1k}$

obtained by instrumenting with (1, ℓ_1k')'. By reinterpreting the proof of
Theorem 1, we have that lim d̂_1 = T_1β_1 a.s., where T_1 = ∫ (∂F/∂Z_1) p_0(X_1, X_2) dν,
Z_1 = α_1 + X_1'β_1.

This result indicates that extraneous variables are accommodated in the
above analysis through their impact on the instruments ℓ_1k. The variables X_2
could be ignored if ℓ_1 of (3.8) did not depend on the value of X_2, for which a
sufficient condition is that X_2 is distributed independently of X_1 (as
indicated in the discussion of a generalized index Z* of Section 2).

The extension permits the analysis of two commonly encountered practical
situations which were not previously treated. The first occurs when the
variables X_2 are qualitative variables, not continuously distributed. The above
result says that when the qualitative variables X_2 are not independent of the
continuous variables X_1, the coefficients β_1 of the continuous variables can
be consistently estimated up to a scalar multiple by the instrumental
variables regression (3.9). The instruments ℓ_1k in this case are just the
score vectors of the distribution of X_1 conditional on the value of X_2,
evaluated at X_1 = X_1k and X_2 = X_2k.

The second practical situation occurs when the behavioral model employs
several index variables. Suppose that M_2 ≥ 2 and that the conditional
expectation (3.7) can be written in the two index form

(3.10)    $E(y \mid X_1, X_2) = F(\alpha_1 + X_1'\beta_1,\, \alpha_2 + X_2'\beta_2) = F(Z_1, Z_2)$

where Z_1 = α_1 + X_1'β_1 and Z_2 = α_2 + X_2'β_2. As above, when X_1 is continuously
distributed and X_1 and X_2 have no variables in common, the slope coefficients
d̂_1 consistently estimate T_1β_1. Moreover, if X_2 is continuously distributed,
then the same argument can be applied to estimating β_2 up to scale. Formally,
reinterpret Assumptions 1 through 5 to apply to X = (X_1', X_2')', noting that
Assumption 3 implies that no linear combination of the components of X_1 is
perfectly correlated with any linear combination of the components of X_2. If
we define

(3.11)    $\ell_2 = -\frac{\partial \ln p_0(X_1, X_2)}{\partial X_2}$

and d̂_2 as the estimated slope coefficients of the linear equation

(3.12)    $y_k = c_2 + X_{2k}'d_2 + u_{2k}$

obtained by instrumenting with (1, ℓ_2k')', then we have that lim d̂_2 = T_2β_2 a.s.,
where T_2 = ∫ (∂F/∂Z_2) p_0(X_1, X_2) dν, with Z_2 = α_2 + X_2'β_2.

Equivalently, we can set X = (X_1', X_2')' and perform the single regression
(3.1), here repeated as

(3.13)    $y_k = c + X_k'd + u_k$

where (1, ℓ_0k')' = (1, ℓ_1k', ℓ_2k')' is used as the instrumental variable. From the
above development, we clearly have that lim d̂' = (T_1β_1', T_2β_2') a.s. Thus the
coefficients of both index variables Z_1 = α_1 + X_1'β_1 and Z_2 = α_2 + X_2'β_2 can be
estimated up to scale when the true function F of (3.10) is unknown. It should
be noted that the scale factors T_1 and T_2 will in general not be equal, so
that only the ratios of components of β_1 or ratios of components of β_2 are
consistently estimated. Ratios of a component of β_1 to a component of β_2 are
not identified by d̂.

A standard example of a two index model obeying (3.10) is the selection
bias model studied by Heckman (1979):

Example 4: Selection Bias

Suppose that y is equal to an index Z_1*, as in

$y = \alpha_1 + X_1'\beta_1 + \varepsilon_1$

but that y is observed only if a second index Z_2* = α_2 + X_2'β_2 + ε_2 is positive.
We assume that (ε_1, ε_2) is distributed independently of (X_1, X_2). Thus, the
conditional expectation of y given X_1 and X_2 is

$E(y \mid X_1, X_2, Z_2^* > 0) = \alpha_1 + X_1'\beta_1 + E(\varepsilon_1 \mid \varepsilon_2 > -(\alpha_2 + X_2'\beta_2)) = F(\alpha_1 + X_1'\beta_1,\, \alpha_2 + X_2'\beta_2)$

so that the structural parameters β_1 and the selection parameters β_2 can be
estimated up to scale without explicit assumptions on the joint distribution
of (ε_1, ε_2). Notice that in this example T_1 = 1, so that lim d̂_1 = β_1 a.s.

By comparing Example 4 and the truncated tobit specification of Example 2, we
see that selection parameters can be estimated up to scale in two polar
situations, namely when the selection index Z_2 has no variables in common with
the structural index Z_1, or when the selection index Z_2 is equal to the
structural index Z_1. Moreover, it is easy to verify that if there is a common
variable appearing in both Z_1 and Z_2, then the large sample limit of the
corresponding component of d̂ of (3.2) is the sum of the corresponding
components of T_1β_1 and T_2β_2.

The above discussion has focused on two index models; clearly analogous
results can be obtained for models with three or more index variables. While
we now return to the notation and framework of Section 3.1, all of the
following econometric results can be reformulated for the above estimators
without difficulty.

3.3 Construction of the Instruments

In this section we discuss the empirical construction of the instrumental
variables ℓ_0k. There are two cases in which application of Theorem 1 is
particularly easy. The first is when the density p_0(X) is known exactly, so
that ℓ_0k can be computed directly from (2.3), and evaluated at X = X_k.
Unfortunately, this case is never likely to be met in practice. The second
case occurs when the form of the density is congenial in that ℓ_0 is exactly
collinear with X or some known function of X. We discuss this case in
conjunction with the normal distribution examples in Section 4.
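
The collinear case can be sketched numerically. If X is multivariate normal with mean μ and covariance Σ, then −∂ln p_0(X)/∂X = Σ⁻¹(X − μ), so the score is linear in X. The code below (an illustration written for M = 2, with sample moments standing in for μ and Σ and with hypothetical helper names) shows that instrumenting with this score reproduces the OLS slopes exactly.

```python
import random

def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def matvec2(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def slopes(inst, Xc, yc):
    """Solve S_wx d = S_wy for instrument rows `inst`, centered X and y."""
    K = len(yc)
    Swx = [[sum(w[i] * x[j] for w, x in zip(inst, Xc)) / K for j in range(2)]
           for i in range(2)]
    Swy = [sum(w[i] * v for w, v in zip(inst, yc)) / K for i in range(2)]
    return matvec2(inv2(Swx), Swy), Swx

rng = random.Random(1)
X, y = [], []
for _ in range(2000):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = [g1, 0.6 * g1 + 0.8 * g2]                 # correlated normal regressors
    X.append(x)
    y.append(1.0 if 0.3 + x[0] - 0.5 * x[1] + rng.gauss(0.0, 1.0) > 0 else 0.0)

K = len(y)
xbar = [sum(x[j] for x in X) / K for j in range(2)]
ybar = sum(y) / K
Xc = [[x[0] - xbar[0], x[1] - xbar[1]] for x in X]
yc = [v - ybar for v in y]

d_ols, Sxx = slopes(Xc, Xc, yc)                   # OLS: instruments are X itself
Sxx_inv = inv2(Sxx)
scores = [matvec2(Sxx_inv, x) for x in Xc]        # normal score, sample moments
d_iv, _ = slopes(scores, Xc, yc)
print(d_ols, d_iv)
```

The two slope vectors agree up to floating point error, because the score instruments are a nonsingular linear transformation of the centered X.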

In general applications the above circumstances will not be valid, and
the score vectors ℓ_0k will have to be estimated. Here we assume that the
marginal distribution of X is modeled, with the density p_0(X) known up
to some estimable parameters, and establish that the natural estimates of the
score vectors ℓ_0k provide valid instruments. We then briefly raise the
prospect of estimating ℓ_0k nonparametrically.

Suppose that the marginal density is assumed to lie within a parametric
family p*(X|Λ), so that p_0(X) = p*(X|Λ_0), where Λ is a finite vector of
parameters which can contain the mean, variances and covariances, etc. of X,
with true value Λ = Λ_0. We make Assumption 6 of the Appendix, which assumes
that p* is twice differentiable with respect to the components of Λ as well as
some other regularity properties.

The application of Theorem 1 now proceeds in two steps. First obtain any
strongly consistent estimate Λ̂ of Λ = Λ_0 using the data X_k, k = 1,...,K.
Standard goodness of fit tests can easily be performed at this stage to assure
the suitability of the assumed parametric form p*. Next construct estimates of
the score vector ℓ_0k for each k = 1,...,K by evaluating (2.3) at X_k and Λ̂ as

(3.14)    $\hat{\ell}_{0k} = -\frac{\partial \ln p^*(X_k \mid \hat{\Lambda})}{\partial X}, \quad k = 1, \ldots, K$

and form the instrumental variable estimator d̂* = (d̂_1*,...,d̂_M*)' of (3.1)
using ℓ̂_0k as in

(3.15)    $\hat{d}^* = (\hat{S}_{0x})^{-1} \hat{S}_{0y}$

where Ŝ_0x and Ŝ_0y are the sample covariance matrices between ℓ̂_0k and X_k and y_k
respectively. The justification of this procedure is formalized as

Theorem 2: Under Assumptions 1, 2, 3, 4, 5 and 6, lim d̂* = Tβ a.s. as K → ∞.

Proof: A direct application of Theorem 1 above and Theorem 7 of Stoker (1983),
reinterpreted to apply to the elements of Λ. QED
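
The two-step procedure can be sketched as follows. The parametric family used here is an assumption made for illustration only: M = 2 independent logistic components with location m_j and scale s_j, for which the score is tanh((x_j − m_j)/(2 s_j))/s_j. Step 1 uses method-of-moments estimates (a logistic variable has variance s²π²/3); step 2 forms (3.15). The data are a simulated censored model with a uniform error, and all names and values are hypothetical.

```python
import math
import random

def two_step_iv(X, y):
    """Two-step estimator d* for M = 2 under an assumed logistic family."""
    K = len(y)
    # Step 1: method-of-moments estimates of location and scale.
    m, s = [], []
    for j in range(2):
        mean = sum(x[j] for x in X) / K
        var = sum((x[j] - mean) ** 2 for x in X) / K
        m.append(mean)
        s.append(math.sqrt(3.0 * var) / math.pi)
    ybar = sum(y) / K
    # Step 2: estimated scores, then sample covariances with X and y.
    S0x = [[0.0, 0.0], [0.0, 0.0]]
    S0y = [0.0, 0.0]
    for x, yk in zip(X, y):
        score = [math.tanh((x[j] - m[j]) / (2.0 * s[j])) / s[j] for j in range(2)]
        for i in range(2):
            S0y[i] += score[i] * (yk - ybar) / K
            for j in range(2):
                S0x[i][j] += score[i] * (x[j] - m[j]) / K
    det = S0x[0][0] * S0x[1][1] - S0x[0][1] * S0x[1][0]
    d1 = (S0x[1][1] * S0y[0] - S0x[0][1] * S0y[1]) / det
    d2 = (S0x[0][0] * S0y[1] - S0x[1][0] * S0y[0]) / det
    return d1, d2

def logistic_draw(rng, loc, scale):
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return loc + scale * math.log(u / (1.0 - u))

rng = random.Random(0)
alpha, beta = -1.0, (2.0, 0.5)
X, y = [], []
for _ in range(50_000):
    x = [logistic_draw(rng, 1.0, 0.5), logistic_draw(rng, -2.0, 2.0)]
    eps = rng.uniform(-1.0, 1.0)
    X.append(x)
    y.append(max(0.0, alpha + beta[0] * x[0] + beta[1] * x[1] + eps))  # censored at 0

d1, d2 = two_step_iv(X, y)
print("d2*/d1* =", d2 / d1, "(beta_2/beta_1 = 0.25)")
```

If the assumed family were wrong, the estimated scores would not be valid instruments, which is why the text suggests goodness of fit checks after the first step.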

While Theorem 2 permits the implementation of Theorem 1 in applications,
it relies on specific modeling of the independent variable distribution. A
question of significant practical importance concerns whether the score
vectors ℓ_0k can be nonparametrically estimated, because then specific modeling
assumptions on p_0(X) would not be necessary. A number of natural methods for
such nonparametric estimation come to mind, such as to use an adaptive score
estimate of the type proposed in Stone (1975), Bickel (1982) and Manski (1984).
Unfortunately, to the author's knowledge, no results are available on
nonparametric score vector estimation for multivariate distributions, as the
above papers are concerned only with univariate distributions. Consequently,
this topic is mentioned because of its natural importance, but relegated to
future research.

3.4 Scale Free Inferences on β

The above results establish the strong consistency of the instrumental
variables coefficients d̂* (and d̂) as an estimator of Tβ. In this section the
asymptotic distribution of d̂* (and d̂) is established, which allows scale free
hypothesis tests on the true value of β to be carried out. Examples of scale
free hypotheses include zero restrictions (β_j = 0), equality restrictions
(β_i = β_j) and ratio restrictions (β_i/β_j = c). Because the data on observed
variables and instruments represent i.i.d. drawings, the asymptotic
distribution of d̂* can be established by very standard methods. We sketch the
argument below, which is just the appropriate specialization of the results of
White (1980, 1982), among others.

Consider the generalized setting where an M-vector of variables W_k is
observed, so that the full set of observations (y_k, X_k', W_k'), k = 1,...,K,
represents a random sample from a joint distribution of y, X and W. Of
interest is the asymptotic distribution of the instrumental variables
estimator d̂_w of equation (3.1) obtained by instrumenting with (1, W_k')',
defined as

(3.16)    $\hat{d}_w = (S_{wx})^{-1} S_{wy}$

where S_wx = Σ_k (W_k − W̄)(X_k − X̄)'/K and S_wy = Σ_k (W_k − W̄)(y_k − ȳ)/K are the relevant sample
covariance matrices. We collect a sufficient set of regularity conditions for
the following results on d̂_w as Assumption IV in the Appendix. Assumption 7
lists the requirements not covered by Assumptions 1-6 for the specific
application of this paper.

If we define δ_w = (Σ_wx)⁻¹Σ_wy, where Σ_wx and Σ_wy are the covariance
matrices between W and X, y respectively, then clearly lim d̂_w = δ_w a.s. If we
in turn define u_wk = (y_k − ȳ) − (X_k − X̄)'δ_w, then we have immediately that E_0(u_wk) = 0,
lim Σ_k (W_k − W̄)u_wk/K = 0 a.s., and

(3.17)    $\sqrt{K}\,(\hat{d}_w - \delta_w) = (S_{wx})^{-1}\, \frac{\sum_k (W_k - \bar{W})\, u_{wk}}{\sqrt{K}}$

By applying plim S_wx = Σ_wx, and the Central Limit Theorem to the second term,
we have

Theorem 3: Under Assumption IV, as K → ∞, √K(d̂_w − δ_w) is asymptotically normal with
mean 0 and covariance matrix V_w = (Σ_wx)⁻¹Σ_wu,wu(Σ_wx')⁻¹, where
Σ_wu,wu is the covariance matrix of (W − E(W))u_w, with
u_w = (y − E(y)) − (X − E(X))'δ_w.

Following White (1980, 1982), V_w is consistently estimated by
V̂_w = (S_wx)⁻¹S_wu,wu(S_wx')⁻¹, where S_wu,wu = Σ_k [(W_k − W̄)(W_k − W̄)'û_wk²]/K, and
û_wk = (y_k − ȳ) − (X_k − X̄)'d̂_w is the estimated residual from (3.1) using d̂_w.
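
A minimal sketch of the estimator d̂_w and the heteroskedasticity-consistent covariance estimate V̂_w, written for M = 2 with plain lists. The function names are hypothetical and the code is an illustration of the formulas above rather than a general-purpose routine.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def center(rows):
    """Subtract column means from a list of 2-element rows."""
    K = len(rows)
    means = [sum(r[j] for r in rows) / K for j in range(2)]
    return [[r[0] - means[0], r[1] - means[1]] for r in rows]

def iv_with_robust_cov(W, X, y):
    """Return (d_w, V_w): IV slopes and the estimate of their asymptotic covariance."""
    K = len(y)
    Wc, Xc = center(W), center(X)
    ybar = sum(y) / K
    yc = [v - ybar for v in y]
    Swx = [[sum(w[i] * x[j] for w, x in zip(Wc, Xc)) / K for j in range(2)]
           for i in range(2)]
    Swy = [sum(w[i] * v for w, v in zip(Wc, yc)) / K for i in range(2)]
    A = inv2(Swx)
    d = [A[i][0] * Swy[0] + A[i][1] * Swy[1] for i in range(2)]
    # Estimated residuals from (3.1) using d_w.
    u = [v - (x[0] * d[0] + x[1] * d[1]) for x, v in zip(Xc, yc)]
    Swu = [[sum(w[i] * w[j] * e * e for w, e in zip(Wc, u)) / K for j in range(2)]
           for i in range(2)]
    At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    V = matmul2(matmul2(A, Swu), At)              # S_wx^{-1} S_wu,wu (S_wx')^{-1}
    return d, V

# Exactly linear data: residuals are zero, so the slopes are exact and V is zero.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0 + 2.0 * x[0] - 3.0 * x[1] for x in X]
d, V = iv_with_robust_cov(X, X, y)                # W = X reduces to OLS
print(d, V)
```

Because V̂_w estimates the covariance of √K(d̂_w − δ_w), a standard error for the j-th slope would be the square root of V[j][j]/K.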

Theorem 3 establishes the asymptotic distributions for all of the linear
coefficient estimators studied in Stoker (1982, 1983). For d̂ of (3.2) above,
Theorem 3 is applied by setting W_k = ℓ_0k to yield

Corollary 4: Under Assumptions 1, 2, 3, 4, 5 and 7, as K → ∞, √K(d̂ − Tβ) is
asymptotically normal with mean 0 and covariance matrix V = Σ_0u,0u,
where Σ_0u,0u is the covariance matrix of ℓ_0u_0, with
u_0 = (y − E_0(y)) − (X − E_0(X))'Tβ.

Proof: Lemma 1 of Stoker (1983) implies that lim S_0x is the Jacobian matrix of
E(X) = μ_0 + θ with respect to θ, which is the M×M identity matrix. Theorem 3 above
then yields the result. QED

Following Theorem 2 for $\hat d$, since plim $\hat S_{\ell x}$ = plim $S_{\ell x}$, we also have

Corollary 5: Under Assumptions 1, 2, 3, 4, 5, 6 and 7, as $K\to\infty$, $\sqrt{K}(\hat d-\gamma\beta)$ is
asymptotically normal with mean 0 and covariance matrix V.

As above, the asymptotic covariance matrix V is consistently estimated by
$\hat V = \hat S_{\ell u,\ell u} = \sum_k[(\hat\ell_{0k}\hat\ell_{0k}')\hat u_{0k}^2]/K$ or $V^* = (\hat S_{\ell x})^{-1}\hat S_{\ell u,\ell u}(\hat S_{\ell x}')^{-1}$, where $\hat u_{0k}$ is the
estimated residual from (3.1) with coefficients $\hat d$; namely

$$\hat u_{0k} = (y_k-\bar y)-(X_k-\bar X)'\hat d.$$

Corollaries 4 and 5 establish the asymptotic distribution required for
testing hypotheses on the value of $\gamma\beta$. This facilitates the testing of certain
hypotheses on the value of $\beta$, which are scale free in that they are unaffected
by the true value of $\gamma$. For example, if $l$ is an M-vector of constants, the
linear restriction $l'\beta=0$ is equivalent to $l'(\gamma\beta)=0$, under which the test
statistic $\sqrt{K}\,l'\hat d$ is asymptotically normal with mean 0 and variance $l'Vl$.
Therefore, by choosing appropriate values of $l$, tests of zero restrictions
(such as $\beta_j=0$) and equality restrictions (such as $\beta_i=\beta_j$) can be carried out
using $\hat d$ and $\hat V$.
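
As a small illustration of such a scale free test, the following sketch computes the standardized statistic for a linear restriction. The function name `linear_restriction_z` and its argument names are our own, and the inputs are assumed to be an estimated coefficient vector, an estimate of the asymptotic covariance matrix V, and the sample size K.

```python
import numpy as np

def linear_restriction_z(d_hat, V_hat, l, K):
    """Sketch: z-statistic for the scale free restriction l'beta = 0.

    Under the restriction, sqrt(K) * l'd_hat is asymptotically N(0, l'V l),
    so the returned ratio is approximately standard normal.
    """
    l = np.asarray(l, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    V_hat = np.asarray(V_hat, dtype=float)
    return np.sqrt(K) * (l @ d_hat) / np.sqrt(l @ V_hat @ l)
```

For instance, an equality restriction between the first two coefficients corresponds to `l = [1, -1, 0, ..., 0]`.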

Tests on the value of a nonlinear differentiable function of $\gamma\beta$ can be
derived by the "delta" method in the usual way. As an example, for testing
whether the ratio $\beta_i/\beta_j$ is equal to a specific value, we have that
$\sqrt{K}[(\hat d_i/\hat d_j)-(\beta_i/\beta_j)]$ is asymptotically normal with mean 0 and variance $\sigma_{ij}$,
where

$$\sigma_{ij} = \frac{1}{d_j^2}\,v_{ii} + \frac{d_i^2}{d_j^4}\,v_{jj} - \frac{2d_i}{d_j^3}\,v_{ij} \tag{3.18}$$

where $d_i=\gamma\beta_i$ and $v_{ij}$ is the i,j element of V. $\sigma_{ij}$ is consistently estimated by
evaluating (3.18) using the appropriate components of $\hat d$ and $\hat V$.
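
The delta method variance for a coefficient ratio can be evaluated directly from an estimated coefficient vector and covariance matrix. The sketch below is illustrative only; the function name `ratio_variance` is hypothetical, and the formula coded is the usual first order expansion of $d_i/d_j$, whose gradient has $1/d_j$ in position i and $-d_i/d_j^2$ in position j.

```python
import numpy as np

def ratio_variance(d, V, i, j):
    """Sketch: delta-method variance of sqrt(K) * (d_i/d_j - beta_i/beta_j).

    d : coefficient vector (estimates of gamma * beta)
    V : asymptotic covariance matrix of sqrt(K) * (d - gamma * beta)
    """
    di, dj = d[i], d[j]
    return (V[i, i] / dj ** 2
            + di ** 2 * V[j, j] / dj ** 4
            - 2.0 * di * V[i, j] / dj ** 3)
```

The same number is obtained as the quadratic form g'Vg with g the gradient described above, which gives a convenient check on any implementation.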

4.  Independent  Variable  Distribution  Examples  and  Related  Discussion 

Here we present several examples based on specific forms of the marginal
distribution of X, to illustrate the structure of the instruments $\ell_{0k}$ and
relate our results to the previous literature.

4.1  Multivariate  Normal  Distributions

As indicated above, the implementation of Theorem 1 is particularly easy
if the score vector $\ell_0$ is exactly collinear with X, for then $\hat d$ is the vector
of OLS slope coefficients of $y_k$ regressed on $X_k$, k=1,...,K. It is easy to
verify that this situation will occur if and only if $p_0(X)$ is of the
multivariate normal form over $\Omega$, as follows. Suppose that $\ell_0$ can be written in
the form

$$\ell_0 = A + BX \tag{4.1}$$


where A is an M-vector and B an M×M matrix of constants, where without loss of
generality, B is symmetric and nonsingular. Since $E_0(\ell_0)=0$, we have that
$A=-B\,E_0(X)=-B\mu_0$. Now, in view of (2.3) we integrate (4.1) with respect to X,
which implies that $\ln p_0(X)$ must be of the form

$$\ln p_0(X) = C - \tfrac{1}{2}(X-\mu_0)'B(X-\mu_0) \tag{4.2}$$

for some constant C, which for $\Sigma^{-1}=B$ clearly implies that

$$p_0(X) = \frac{1(X\in\Omega)}{\int_\Omega p_N(X\mid\mu_0,\Sigma)\,d\nu}\;p_N(X\mid\mu_0,\Sigma) \tag{4.3}$$

where $p_N(X\mid\mu_0,\Sigma)$ is the multivariate normal density with mean $\mu_0$ and
covariance matrix $\Sigma$, and $1(X\in\Omega)$ is the indicator function of the event $X\in\Omega$.
Consequently, (4.1) in general implies that $p_0(X)$ is multivariate normal over
the carrier $\Omega$, and when $\Omega=R^M$, then $p_0(X) = p_N(X\mid\mu_0,\Sigma)$.

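It is informative to reexamine the structure of Theorem 1 when $\Omega=R^M$ and
$p_0(X)=p_N(X\mid\mu_0,\Sigma)$. The translation family $\Pi_N$ is defined via
$p(X\mid\theta)=p_N(X\mid\mu_0+\theta,\Sigma)$. The induced marginal distributions of $Z = \alpha + X'\beta$ are
determined by $\theta'\beta$, as $p^{**}(Z\mid\theta'\beta)=p_N(Z\mid\alpha+\mu_0'\beta+\theta'\beta,\ \beta'\Sigma\beta)$. The mean of y can be
computed via (2.4) as $E(y)=\phi^*(\theta)$, or equivalently using the marginal density
of Z as

$$\phi^*(\theta) = E(y) = \int F(Z)\,p^{**}(Z\mid\theta'\beta)\,d\nu = \phi^{**}(\theta'\beta) \tag{4.4}$$

The aggregate effects on E(y) of changing $\theta$ can therefore be written as

$$\frac{\partial\phi^*}{\partial\theta} = \left(\frac{\partial(\theta'\beta)}{\partial\theta}\right)\left(\frac{\partial\phi^{**}}{\partial(\theta'\beta)}\right) = \beta\left(\frac{\partial\phi^{**}}{\partial(\theta'\beta)}\right) \tag{4.5}$$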


Now, since $\Pi_N$ is an exponential family with driving variable X, the results of
Stoker(1982) imply that the OLS slope coefficient vector d of (3.1) converges
strongly to $\partial\phi^*(\theta)/\partial\theta$, which from (4.5) gives the result of Theorem 1. The
earlier interpretation of the scaling factor $\gamma$ as the effect on E(y) of
varying $\eta=E(Z)=\alpha+\mu_0'\beta+\theta'\beta$ is obvious from (4.5). Finally, since $p^{**}(Z\mid\theta'\beta)$ is
in exponential family form with driving variable Z, another application of the
same results from Stoker(1982) indicates that $\gamma$ is the a.s. limit of the OLS
regression coefficient of $y_k$ on $Z_k = \alpha + X_k'\beta$.
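
The normal case can be checked by simulation. The sketch below is our own numerical illustration, not part of the derivation: it assumes a censored outcome y = max(0, Z + e) with standard normal e, and the parameter values, seed and function name `simulate_censored_ols` are arbitrary choices. With normally distributed X, the OLS slopes of y on X should be approximately proportional to the true coefficients, with the factor of proportionality close to the OLS slope of y on Z.

```python
import numpy as np

def simulate_censored_ols(K=200_000, seed=0):
    """Sketch: OLS on normal X recovers beta up to scale in a censored model."""
    rng = np.random.default_rng(seed)

    # Hypothetical parameter values chosen only for illustration
    alpha = 0.3
    beta = np.array([1.0, -0.5])
    mu = np.array([0.5, -1.0])
    Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])

    X = rng.multivariate_normal(mu, Sigma, size=K)
    Z = alpha + X @ beta
    y = np.maximum(0.0, Z + rng.normal(size=K))  # censored at zero

    # OLS slope coefficients of y on X (deviations from means)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    d = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)

    # OLS slope of y on the index Z, which estimates the scale factor
    Zc = Z - Z.mean()
    gamma = (Zc @ yc) / (Zc @ Zc)
    return d, beta, gamma

d, beta, gamma = simulate_censored_ols()
print("ratio d2/d1:", d[1] / d[0], " true ratio:", beta[1] / beta[0])
print("d1/beta1:", d[0] / beta[0], " slope of y on Z:", gamma)
```

The individual slopes are attenuated relative to the true coefficients, but their ratio is not, which is the sense in which the coefficients are estimated up to scale.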

It is useful at this point to discuss the results of Ruud(1983a), Deaton
and Irish(1984) and Chung and Goldberger(1984). While Ruud(1983a) studies
quasi maximum likelihood estimators for binary discrete choice models and
Deaton and Irish(1984) and Chung and Goldberger(1984) employ a generalized
definition of the indicator Z, each paper utilizes a condition that $E(X\mid Z)$ is
linear in Z;

$$E(X\mid Z) = G + HZ \tag{4.6}$$


where G and H are M-vectors. For $Z=\alpha+X'\beta$, (4.6) is implied by multivariate
normality of X. The value of condition (4.6) is that it makes X effectively
one dimensional for the purpose of calculating covariances with y, so that the
function F impacts only on a scalar covariance. This is easily seen from the
following proof, which is basically Chung and Goldberger's(1984) result for
censored model cases. Let y=F(Z), and let $\Sigma_{xx}$, $\Sigma_{xy}$ and $\Sigma_{xz}$ denote the respective
covariance matrices between X, y and Z, and $\sigma_{zy}$ and $\sigma_z^2$ denote the respective
scalar covariance values. Now, begin by expressing $\Sigma_{xy}$ as

$$\Sigma_{xy} = E_0((X-\mu_0)y) = E_Z(E(X-\mu_0\mid Z)F(Z)) = H\,E_Z((Z-\eta_0)F(Z)) = H\sigma_{zy} \tag{4.7}$$


where the second equality follows from (2.1) and the latter equalities follow
from (4.6). Note that the value of $\sigma_{zy}$ entirely captures the impact of the
function F. Now, recalling here that d is the OLS slope coefficient vector of
y regressed on X, we have

$$\lim d = (\Sigma_{xx})^{-1}\Sigma_{xy} = (\Sigma_{xx})^{-1}H\sigma_{zy} = \beta\left(\frac{\sigma_{zy}}{\sigma_z^2}\right) \tag{4.8}$$

where the latter equality follows from $H=\Sigma_{xz}/\sigma_z^2$ and $\beta=(\Sigma_{xx})^{-1}\Sigma_{xz}$.

This argument was recalled in order to indicate the identical role played
by the linearity condition (4.6) and density translation in the normal
distribution case. The fact that the marginal densities of Z depend only on
$\theta'\beta$ suffices to reduce the dimensionality of the aggregate effects as in
(4.5), which is exactly the impact of the linearity condition (4.6) on (4.7).
Moreover, equality between (4.5) and (4.8) gives an alternative demonstration
that $\gamma = \sigma_{zy}/\sigma_z^2$, the large sample OLS slope regression coefficient of $y_k$ on
$Z_k=\alpha+X_k'\beta$.

Given interest in conditions that depend only on the marginal
distribution of X, it is natural to inquire how much more general than
multivariate normality is the linearity condition (4.6) with $Z=\alpha+X'\beta$. We have
no concrete answer here, although no obvious examples of nonnormal densities
where (4.6) is valid for all $\alpha$, $\beta$ are immediate. It is true that if the
individual components of X are independent or homoscedastic, then (under some
regularity conditions) (4.6) implies multivariate normality (c.f. Kagan,
Linnik and Rao(1973)), although the implications of (4.6) in more general
circumstances are not known to the author.

4.2  Mixtures  of  Normals 


An obvious circumstance where the X data are nonnormal would occur if the
sample distribution displayed several pronounced modes. In this case, it
might be appropriate to model the X density via a mixture of normals. We
indicate below that the large sample limit of d in this case is the
appropriate weighted average of the limits of the OLS slope coefficients that
would be obtained from regressions over each of the component normal
distributions.

All of the intuition of this example can be seen in the case of a two
component normal mixture. Suppose that the marginal distribution of X is given
as

$$p_0(X) = \lambda p_1(X) + (1-\lambda)p_2(X) \tag{4.9}$$

where $p_1(X) = p_N(X\mid\mu_1,\Sigma_1)$, $p_2(X) = p_N(X\mid\mu_2,\Sigma_2)$ are the normal component
densities, $\mu_1 \neq \mu_2$ and $0 < \lambda < 1$. The relevant score vector $\ell_0$ is

$$\ell_0 = -\frac{\partial\ln p_0(X)}{\partial X} = w(X)\Sigma_1^{-1}(X-\mu_1) + (1-w(X))\Sigma_2^{-1}(X-\mu_2) \tag{4.10}$$

where $w(X) = \lambda p_1(X)/p_0(X)$. If d is the instrumental variables slope vector of
(3.1) using $\ell_{0k}$ as instrument, then $\lim d = \gamma\beta$ a.s. By direct computation, the
limit of d can be written as

$$\lim d = \lambda\Sigma_1^{-1}\int (X-\mu_1)\,y\,q(y\mid X)\,p_1(X)\,d\nu + (1-\lambda)\Sigma_2^{-1}\int (X-\mu_2)\,y\,q(y\mid X)\,p_2(X)\,d\nu = \lambda d_1 + (1-\lambda)d_2 \tag{4.11}$$


where $d_1$ is the large sample value of the OLS coefficient of y on X of (3.1)
if X was distributed with respect to the normal density $p_1(X)$, and $d_2$ is the
large sample value of the OLS coefficient of y on X if X was distributed with
respect to the normal density $p_2(X)$. Consequently, one can consider the proper
slope estimator in this context as weighting together regression coefficients
from samples distributed with respect to each of the component densities.
From Section 4.1 we have that $d_1=\gamma_1\beta$ and $d_2=\gamma_2\beta$, where $\gamma = \lambda\gamma_1+(1-\lambda)\gamma_2$. This
is consistent with the formula $E(y) = \phi^{**}(\theta'\beta) = \lambda\phi_1^{**}(\theta'\beta) + (1-\lambda)\phi_2^{**}(\theta'\beta)$,
where $\phi^{**}$, $\phi_1^{**}$ and $\phi_2^{**}$ are the aggregate functions derived as in (4.4) from
translation families generated by $p_0(X)$, $p_1(X)$ and $p_2(X)$ respectively.

Of course, separate OLS estimates of $d_1$ and $d_2$ could not in general be
computed with observed data, because it is not in general possible to identify
which observations $X_k$ were drawn individually from $p_1(X)$ or from $p_2(X)$. Also,
to compute d, estimates $\hat\ell_{0k}$ of the true score vectors have to be constructed,
which requires estimates of $\mu_1$, $\mu_2$, $\Sigma_1$, $\Sigma_2$ and $\lambda$. These could be obtained as
the consistent roots of the likelihood equation for $X_k$, k=1,...,K implied by
(4.9). Finally, the above weighted component regression interpretation clearly
holds for the case of a mixture of more than two normal components.
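
To make the construction of the mixture score vectors concrete, the following sketch evaluates the two component score for given parameter values. It is an illustration under our own assumptions: the helper names `normal_pdf` and `mixture_score` are hypothetical, and in practice the parameters would be replaced by consistent estimates as described above.

```python
import numpy as np

def normal_pdf(X, mu, Sigma):
    """Multivariate normal density evaluated at each row of X."""
    X = np.atleast_2d(X)
    M = len(mu)
    diff = X - mu
    quad = np.einsum("ki,ij,kj->k", diff, np.linalg.inv(Sigma), diff)
    norm = np.sqrt((2.0 * np.pi) ** M * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def mixture_score(X, lam, mu1, Sigma1, mu2, Sigma2):
    """Sketch: score -d ln p0 / dX for a two component normal mixture.

    Returns w(X) * Sigma1^{-1}(X - mu1) + (1 - w(X)) * Sigma2^{-1}(X - mu2),
    with w(X) = lam * p1(X) / p0(X), one row per observation.
    """
    X = np.atleast_2d(X)
    p1 = normal_pdf(X, mu1, Sigma1)
    p2 = normal_pdf(X, mu2, Sigma2)
    w = lam * p1 / (lam * p1 + (1.0 - lam) * p2)
    # Sigma^{-1} is symmetric, so right-multiplying rows gives Sigma^{-1}(x - mu)
    s1 = (X - mu1) @ np.linalg.inv(Sigma1)
    s2 = (X - mu2) @ np.linalg.inv(Sigma2)
    return w[:, None] * s1 + (1.0 - w)[:, None] * s2
```

The rows returned by `mixture_score` would serve as the instruments in the linear instrumental variables regression of y on X.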

4.3  Elliptical  Distributions

In the same fashion as OLS slope estimators arise when independent
variables are multivariate normally distributed, weighted least squares
estimators are called for when the independent variables are elliptically
distributed. Suppose that the marginal density of X has the elliptical form

$$p_0(X) = p^*\left(\tfrac{1}{2}(X-\mu_0)'\Sigma^{-1}(X-\mu_0)\right) \tag{4.12}$$

where $\mu_0 = E_0(X)$ and $\Sigma$ is a positive definite matrix. Here the score vectors
take the form

$$\ell_0 = \omega(r(X))\,\Sigma^{-1}(X-\mu_0) \tag{4.13}$$

where $r(X) = \tfrac{1}{2}(X-\mu_0)'\Sigma^{-1}(X-\mu_0)$ is the distance measure and $\omega(r)=-\partial\ln p^*/\partial r$.
The proper instrumental variables estimator d for proportionately estimating $\beta$
is weighted least squares, where the data for the k-th observation are weighted
by $\sqrt{\omega(r(X_k))}$. In the multivariate normal case we have $\omega(r)=1$ for all r. Note
in general that $E_0(\ell_0)=0$ implies that the weights $\omega(r(X))$ are uncorrelated
with X; however, correlations with squares and cross products of X are
possible. As above, one would require estimates of the parameters determining
$p_0(X)$, in particular $\mu_0$ and $\Sigma$, in order to estimate the proper weights $\omega(r(X))$
for each observation $X_k$.
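
As a worked illustration of the weights (our own example, not one treated in the text), suppose the density is of multivariate Student t form with $\nu > 1$ degrees of freedom, where the positive definite matrix in the quadratic form is interpreted as a scale matrix rather than the covariance matrix. Writing r for one half of the quadratic form and $\omega(r)$ for minus the derivative of the log density with respect to r, we obtain:

```latex
\begin{align*}
p^*(r) &\propto \left(1 + \frac{2r}{\nu}\right)^{-(\nu+M)/2}, \\
\omega(r) &= -\frac{\partial \ln p^*(r)}{\partial r}
           = \frac{\nu+M}{2}\cdot\frac{2/\nu}{1 + 2r/\nu}
           = \frac{\nu+M}{\nu+2r}.
\end{align*}
```

Observations far from the center receive smaller weight, and $\omega(r)\to 1$ as $\nu\to\infty$, which recovers the unweighted (OLS) normal case.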

4.4  Multivariate  Lognormal  Distributions

Other cases where X is distributed in a nonnormal fashion occur when the
sample distribution of X is skewed, as one would expect for variables
measuring income or other wealth components. As a final example we consider
the case where X is lognormally distributed; namely where $\ln(X-\lambda)$ is
distributed as a multivariate normal vector with mean $\mu^*$ and covariance matrix
$\Sigma^*$, where $\lambda$ is a vector of constants. Here $\Omega$ is defined as the set $\Omega = \{X\mid X>\lambda\}$,
with the standard definition of the lognormal density augmented by setting it
to zero for $X\in\partial\Omega$.

By direct computation, the appropriate score vector for this case is
given as

$$\ell_0 = [\mathrm{diag}(X-\lambda)]^{-1}\left[\iota + (\Sigma^*)^{-1}(\ln(X-\lambda)-\mu^*)\right] \tag{4.14}$$

where $\mathrm{diag}(X-\lambda)$ is the diagonal matrix with i-th diagonal element $X_i-\lambda_i$ and
$\iota$ is the M-vector of ones. To construct the vectors $\hat\ell_{0k}$, one would evaluate
(4.14) at $X_k$ and consistent estimates of $\lambda$, $\mu^*$ and $\Sigma^*$.
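
The lognormal score vector can likewise be evaluated directly. The sketch below is illustrative only: the function name `lognormal_score` and its argument names are our own, with `lam` denoting the vector of lower bounds and `mu_star`, `Sigma_star` the mean and covariance matrix of the logged, shifted variables.

```python
import numpy as np

def lognormal_score(X, lam, mu_star, Sigma_star):
    """Sketch: score -d ln p0 / dX when ln(X - lam) is multivariate normal.

    Each row is [diag(x - lam)]^{-1} [1 + Sigma_star^{-1}(ln(x - lam) - mu_star)].
    Requires every component of X to exceed the corresponding bound in lam.
    """
    X = np.atleast_2d(X)
    shifted = X - lam
    resid = np.log(shifted) - mu_star
    # Sigma_star^{-1} is symmetric, so row @ inverse gives Sigma_star^{-1} resid
    inner = 1.0 + resid @ np.linalg.inv(Sigma_star)
    return inner / shifted
```

Each component is the corresponding normal score for the logged variable, shifted by one and divided by the distance of the observation from its lower bound.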

This example points out the close connection between the proper score
vectors and the specification of the index Z in the behavioral equation (2.1).
The proper score function is given by (4.14) when X is the correct
specification of variables in the index $Z=\alpha+X'\beta$. If, alternatively, we set $\lambda=0$
above, and the index Z were defined as $Z=\alpha+\ln(X)'\beta$, then the results of
Section 4.1 would apply, with the proper estimator the OLS slope coefficients
of $y_k$ regressed on $\ln(X_k)$. While in many applications the precise form of the
index Z may not significantly affect the coefficient ratio estimates, it is
important for the correct application of our results.

5.  Summary  and  Conclusion 


In  this  paper  a  linear  instrumental  variables  estimator  d  is  proposed  for 
estimating  the  ratio  of  coefficients  in  single  index  models.  The  framework  is 
illustrated  by  several  common  examples  of  limited  dependent  variables  models, 
as  well  as  models  involving  a  transformed  dependent  variable.  Similar 
estimators  are  indicated  for  multiple  index  models,  and  models  where 
extraneous  variables  are  present.  The  construction  of  the  instrumental 
variables  is  discussed,  and  illustrated  by  several  examples  of  specific 
independent  variable  distributions.  The  asymptotic  distribution  of  d  is 
established  for  purposes  of  statistical  inference. 

There are two major advantages to the proposed estimator $\hat d$. First, $\hat d$ is
nonparametric to the extent that it is robust to many specific functional form
and stochastic distribution assumptions. If a particular application requires
only estimates of the ratios of components of $\beta$, then $\hat d$ will suffice. Scale
free hypotheses on $\beta$ can be tested using $\hat d$. Moreover, in a general application
where different sets of modeling assumptions produce substantively different
estimated parameter values, $\hat d$ will provide useful information for choosing the
best specification.

The other major advantage in using $\hat d$ is that it is a linear estimator,
once the instruments are computed. Consequently, once the distribution of the
independent variables is characterized, the computation of $\hat d$ is easy and
relatively inexpensive, particularly for large data bases.

There are also two drawbacks to the results. First, to construct the
proper instruments, the distribution of the independent variables must be
modeled, and the score vectors derived from the assumed density. This problem
can be overcome by further research on nonparametric estimation of
multivariate score vectors, which given the current state of work on adaptive
estimation, appears very promising. The second drawback is that our results
apply only to estimating the coefficients of continuously distributed
variables, but most serious applications to microeconomic data will require
using discrete as well as continuous independent variables. While we have
indicated above how discrete variables can be accommodated in the estimation of
continuous variable coefficients, the question of how to nonparametrically
estimate discrete variable coefficients up to scale remains open.


Appendix:  Further  Regularity  Assumptions

For the purpose of differentiating under integral signs, define
difference quotients as

$$D_{yj}(y,X,h) = \frac{y\,q(y\mid X)\,[p_0(X-he_j)-p_0(X)]}{h}$$

$$D_{ij}(X,h) = \frac{X_i\,[p_0(X-he_j)-p_0(X)]}{h}$$

for i,j=1,...,M, where $X_i$ is the i-th component of X, $e_j$ is the unit vector
with j-th component 1 and h is a scalar. We now make

Assumption 5: There exist $\nu$-integrable functions $g_{yj}(y,X)$ and $g_{ij}(X)$ for
i,j=1,...,M such that for all h where $0 < |h| < h_0$,

$$|D_{yj}(y,X,h)| < g_{yj}(y,X)$$

$$|D_{ij}(X,h)| < g_{ij}(X)$$

for all i,j=1,...,M.

For the purpose of using estimated parameters to construct the score
vector instruments, define

$$\ell_0(\Lambda) = -\frac{\partial\ln p^*(X\mid\Lambda)}{\partial X}$$

Denote the j-th component of $\ell_0(\Lambda)$ as $\ell_{0j}(\Lambda)$ and assume

Assumption 6: $p^*(X\mid\Lambda)$ is twice differentiable with respect to the components
of $\Lambda$ in an open neighborhood of $\Lambda=\Lambda_0$. There exist measurable functions
$G_{yj}(y,X)$ and $G_{ij}(X)$, i,j=1,...,M such that

$$|y\,\ell_{0j}(\Lambda)| < G_{yj}(y,X)$$

$$|X_i\,\ell_{0j}(\Lambda)| < G_{ij}(X)$$

for all $\Lambda$ in an open neighborhood of $\Lambda_0$, where $E_0(G_{yj}^{1+\tau})$ and $E_0(G_{ij}^{1+\tau})$ are
bounded for some $\tau>0$, i,j=1,...,M.

A  sufficient  set  of  conditions  for  establishing  the  asymptotic 
distribution  of  instrumental  variables  slope  estimators  is  given  as 

Assumption IV: The means and covariance matrices of y, X and W exist, and the
covariance matrix $\Sigma_{wx} = E[(W-E(W))(X-E(X))']$ is nonsingular. For
$u_w=(y-E(y))-(X-E(X))'\delta_w$, the covariance matrix of $(W-E(W))u_w$ exists.

For  deriving  the  asymptotic  distribution  of  the  specific  estimator  of  this 
paper,  we  require 

Assumption 7: For $u_0=(y-E_0(y))-(X-E_0(X))'\gamma\beta$, the covariance matrix of $\ell_0u_0$
exists.

Footnotes


1.  The sensitivity of estimates to specific stochastic distribution
assumptions in certain limited dependent variable contexts is well known. For
example, Heckman and Singer(1984) illustrate such sensitivity for duration
models, and establish an approach based on nonparametrically estimating the
stochastic heterogeneity distribution.

2.  See also Greene(1981, 1983), Lawley(1943) and Stewart(1983).

3.  Ruud(1983b) studies a similar estimation problem and proposes a different
technique.

4.  The behavioral modeling framework of Deaton and Irish(1984) and Chung and
Goldberger(1984) is slightly different from that considered here, since it
subsumes situations where $\varepsilon$ (our notation) is uncorrelated with X, but
possibly not independent.

5.  Manski(1975) presents an alternative nonparametric method of estimating
both $\alpha$ and $\beta$ for discrete choice models.

6.  Note that Assumption 3 requires that $F(\alpha+X'\beta)$ is defined over the set
$\Omega(\Theta) = \{X+\theta \mid X\in\Omega,\ \theta\in\Theta\}$.

7.  There are a number of regression estimators that measure the effects of
discrete variables, however none appear to estimate the coefficients of
discrete variables up to the same scalar multiple as applicable to the
continuous coefficients. For example, suppose that $X_2$ is a single discrete
variable taking the values 0 and 1, and the behavioral model implies that
$E(y\mid X_1,X_2)=F(\alpha+X_1'\beta_1+X_2\beta_2)$. The joint density of $X_1$ and $X_2$ can be written as

$$p_0(X_1,X_2) = \begin{cases} (1-\lambda)\,p^0(X_1) & \text{if } X_2=0 \\ \lambda\,p^1(X_1) & \text{if } X_2=1 \end{cases}$$

where $\lambda$ is the probability that $X_2=1$ and $p^j$ is the conditional density of $X_1$
given that $X_2=j$. Now suppose that one estimates the equation

$$y_k = c + X_{1k}'d_1 + X_{2k}d_2 + u_k$$

using instruments $(1, \hat\ell_{0k}, \hat\ell_{dk})$, where $\ell_d = \partial\ln p_0(X_1,X_2)/\partial\lambda$, so that $(d_1',d_2)$
is an estimator of the macroeconomic effects of varying $E(X_1)$ and $E(X_2)$ on
E(y). It is easy to show that $\lim d_1=\gamma_1\beta_1$ and that
$\lim d_2=E_0(y\mid X_2=1)-E_0(y\mid X_2=0)$. While $d_2$ is a measure of the impact of the
discrete variable $X_2$, the conditions under which $\lim d_2=\gamma_1\beta_2$ appear to involve
severe restrictions on the structure of the function F.


8.  For instance, in the selection model of Example 4, if a variable $X_i$ was
contained in both $X_1$ and $X_2$, then its coefficient $d_i$ from (3.1) will
consistently estimate $\beta_{1i}+\gamma_2\beta_{2i}$, the structural coefficient plus a selection
term.


9.  Manski(1984) also proposes similar work on multivariate extensions. It
should be noted that the nonparametric estimation of $\ell_{0k}$ called for in the
present paper is not as demanding as that proposed by Manski, because the X
data is observed.

10.  Moreover, $V^*$ is just the "heteroscedasticity consistent" variance estimator
of White(1980).

11.  On this point, see the discussion in Deaton and Irish(1984).